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Chem. Soc, 128, 3228 (2006)] allows continuous optimization in discrete chemical space and thus is quite 
useful in the design of molecules for targeted properties. To address further challenges arising from the 
rugged, continuous property surfaces in the LCAP approach, we develop a gradient-directed Monte Carlo 



^ ■ (GDMC) strategy as an augmentation to the original LCAP optimization method. The GDMC method 

Ph: 

retains the power of exploring molecular space by utilizing local gradient information computed from the 
Qj • LCAP approach to jump between discrete molecular structures. It also allows random Monte Carlo moves to 

O , overcome barriers between local optima on property surfaces. The combined GDMC and LCAP approach is 

Q ■ demonstrated here for optimizing nonlinear optical (NLO) properties in a class of donor-acceptor substituted 



benzene and porphyrin frameworks. Specifically, one molecule with four nitrogen atoms in the porphyrin 
ring was found to have a larger first hypeipolarizability than structures with the conventional porphyrin 
motif. 



I. INTRODUCTION 

Molecular and material design presents an extremely challenging problem because the num- 
ber of accessible stable candidate molecules in the sampling space is immense^'^'^. In such a 
vast molecular space, direct enumeration and evaluation of molecules are expensive and ineffi- 
cient. Furthermore, since each molecular structure in the space is unique and discrete, traditional 
methods are usually based on combinatorial chemistry. Each molecule is generated from building 
blocks and its specific chemical properties are analyzed for selection in the design process. Thus, 
a number of discrete optimization approaches, such as branch-and-bound methods'^, Monte Carlo, 
and genetic algorithms'"^, have been utilized to explore the rich molecular space for the desired 
chemical properties. All of these approaches require extensive molecular property computations to 
obtain some optimal structures. To overcome the difficulties associated with discrete optimization, 
several approaches^^^ have been proposed to transform the discrete optimization into a continuous 
optimization. However, none of these methods can be generalized to optimize the desired prop- 
erties for molecular design. Recently, we introduced a linear combination of atomic potentials 
(LCAP) scheme^''. The LCAP method provides a smooth chemical property surface interpolating 
those of the discrete target molecules, thus facilitates property optimization for molecular design. 
A related approach has also been suggested but was based on atom type variation in a fixed chem- 
ical skeletoni^. 

A molecule or solid is characterized by its electron-nuclear potential 

''W = -E];\ (1) 

R I* ^1 

where Zr is the atomic number at position R. This electrostatic potential and the number of 
electrons A^, enter the Schrodinger equation and determine the properties of the molecules. As 
such, molecular or material properties are functionals of v(r) and A^ . 

In the LCAP approach, we thus formulate the design of optimized molecules or materials as the 
optimization of properties in the functional space of v(r) and A^— . The advantages of optimization 
based on the potential arise from both the potential's "smoothness" and the favorable scaling of the 
computational cost with system size. The complexity of the potential function grows linearly with 
the molecular size. This is in stark contrast to the combinatorial explosion of possible molecular 
structures that would fill a growing molecular volume. The challenge at hand is how best to carry 
out the potential-function optimization; it is essential that the optimized potential be linked to real 
molecules. While all molecules lie within the space of all v(r), not all potentials map back to 



chemical structures. A full optimization in potential space most likely will lead to a potential that 
does not map to a molecule, which has a limited form of the potential given by eq. ©• 

The LCAP method is an effective way to limit the scope of the optimization in potential space. 
Specifically, the potential is expanded as a linear combination of atomic potentials, 

v(r) = £Z7M(r) (2) 

where v^(r) is the potential of an atom (or chemical group) A at position R, v^(r) = EfiV5(r), 
VB(r) = — I J^i . The coefficient b^ defines the admixture of a particular atom or group in the 
molecular potential. The constraints on b^ are Y^a^a ~ ^ ^^^ < b^ < 1. When b^ = or 
b^ = I for each R, v(r) corresponds to a "real" molecule. Therefore, within the LCAP framework, 
the design of molecules with an optimized targeted property becomes the optimization of the 
coefficients {b^} for given sets of R (geometry) and v^ (atom or group types). The LCAP scheme 
has been used to find the potential function that produces structures with optimal properties, such 
as electronic polarizabilities and first hyperpolarizabilities for the simple case with two substitution 
sites. We showed that the optimal structures can be determined without directly enumerating all 
of the possible structures. We further implemented the LCAP scheme in the tight-binding and 
semi-empirical framework s ^^i^^. Our studies showed that the linear combination of atomic/group 
coefficients in LCAP indeed smooths out the discreteness of the molecular space and facilitates 
continuous optimization. 

However, when molecular diversity grows (e.g., as chemical groups of varying structure, num- 
ber of electrons, etc. enter), the ruggedness of the property surfaces grows as well. The rugged sur- 
face makes the property optimizations inefficient, since the optimization becomes easily trapped 
in local minima. In addition, a fractional number of electrons may appear during the continuous 
optimization process, which can lead to convergence difficulty for the property calculations when 
using quantum mechanical approaches. 

In this paper, we develop an optimization approach, the gradient-directed Monte Carlo 
(GDMC) approach, which retains the convenience of continuous optimization while circumvent- 
ing some of the difficulties associated with the ruggedness of property surfaces. The method 
utilizes local gradient information with respect to the LCAP coefficients {b^}, but preserves the 
simplicity of exploring molecular space by jumping between discrete molecular structures. It also 
allows random Monte Carlo jumps to overcome barriers between local property minima. As such, 
the GDMC method avoids the surface ruggedness that arises from chemical structures of "inter- 
mediate" composition (i.e., structures with non-integer LCAP coefficients; see Ref.— for further 



information). Note that the application GDMC in conjunction with using finite difference approx- 
imation to the gradients was reported in Ref. Keinan et al. ^^, without a description of algorithmic 
details. The related LCAP application to locate the deepest minimum energy configuration of face 
centered cubic Au - Pd alloys has also been reported. ^^ In a related manner, the trial wave func- 
tion in a fixed-node quantum Monte Carlo method^^ "serves as a guiding function for a random 
walk of the electrons through configuration space", which is indeed a biased MC approach for the 
continuous electron coordinate space. This is similar with our proposed GDMC approach for the 
discrete molecular space. 

We apply the combined GDMC and LCAP approaches to optimize the static first electronic 
hyperpolarizability /3 for molecular libraries. In SectionHIl the LCAP approach for the first hyper- 
polarizability is explained briefly. In Section Iflll the general GDMC algorithm is introduced for 
molecular property optimization. Section |IV] describes how to combine GDMC with the LCAP 
method to optimize the first electronic hyperpolarizability. Three examples of hyperpolarizability 
optimization are presented in Section [V] Finally, Section |VI] summarizes the combined GDMC 
and LCAP approaches. 

II. THE LCAP APPROACH FOR THE FIRST HYPERPOLARIZABILITY 

Traditional molecular design is carried out by examining families of discrete structures, which 
may be generated using combinatorial libraries, intuition, or qualitative structure-activity rela- 
tionships. But for any molecule, the Schrodinger equation is determined by the nuclear-electron 
attraction potential v(r) and the number of electrons A^. Optimization of v(r), as developed in our 
recent studies^, is appealing because continuous search algorithms based on gradient calculations 
can be much more efficient than traditional molecular design methods. The discrete molecular 
optimization problem becomes a continuous one when v(r) is expanded as a linear combination 
of atomic potentials (LCAP), as in Eq.([2l). 

In this paper, we are interested in the linear and nonlinear response properties to an external 
electric field. In an electric field F, the molecular energy is 

£(F) = E{0)-'£^,F,-^'£aijFiFj 

^- ij 

- l^LP'jkm^k - ^ E YukiF^FjFkF, -■■■ (3) 

ijk ' ijkl 

where E{0) is the energy in the absence of the applied electric field, the indices /, j, k, and / 
span the molecular (x, y, and z) axes, /i,- is the permanent dipole moment and aij is the (linear) 



polarizability. The first hyperpolarizability is ^ijk and the second hyperpolarizability is yij^i. We 
focus on the static first hyperpolarizability in this paper. Specifically, the first hyperpolarizability 
tensor element jS^^^ along the z-axis can be calculated using the finite-field approacb^^i^: 



Pz. 



^{E{2K)-E{-2F,)} + {E{F,)-E{-F,)} 



1^1 



where F^ is the applied electric field along the z-axis and ^(K) is the corresponding energy. 
In the LCAP approach, the jS^^j- gradients with respect to the LCAP coefficients {Z?J} are. 
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as implemented^*^ in the PWSCF^^ program based on plane waves with ultra-soft pseudopotentials 
(USPP)i^. In PWSCF, for atom A, vj(r) is replaced by the USPP, 

VA(r,rO = v^(r,r')+v^(r,rO 

= vrW5(r-r')+i:<|/3„^)(j8^| (6) 

nm 

where D^,^^, /3,f and v^°'(r) characterize the USPP for atom A. The electrostatic potential v(r) can 
be expanded as a linear combination of atomic potentials in the USPP framework. 



v(r,rO = £Z,K^'--(r)5(r-r') + i:Z.K''^'(r,rO 
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where the first term is the total local potential and the second term is the total nonlocal potential of 
v(r) for the system. The total energy of the system in an external electric field can then be written 

h 



E{F,) =l^U 



-^V^-^v"'(r,rO 



(l)i)+EH[n{r) 



where 



+ E,,[n{r)]+Jdr(v'"'{r)-zF,y{r) 



i [_ R,A nm 

R 4 nm 



(8) 

(9) 
(10) 



Here, ^^^ = / Jr2^„(r). From eqs. 
coefficients b^ with A'^ electrons is: 
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Here, D^^=D%, + JVeffir)Q^Jr)dr, K//(r) = y^(r) +y,,(r) +v'"'-(r) 'ZF, and Lr.^ Z^ = A^. 
The chemical potential of the system is /i , and Z^ is the atomic number of atom A, or the electron 
number of functional group A at position R. After obtaining the gradients {3^}, the P^^z gradients 
with respect to LCAP coefficients can be computed using eq. [51 As described in Ref.^*^, with the 
availability of gradients, continuous optimization can be carried out effectively. 

Our studies with expanded molecular fragment libraries showed that the property surfaces can 
become too complicated to permit efficient hyperpolarizability optimization in the continuous 
space. Indeed, for the specific NLO properties, potentials associated with non-integer LCAP coef- 
ficients can have large and rapidly varying hyperpolarizabilities as electron numbers are varied and 
as electronic states move in and out of degeneracy. This challenge presented by adding richness to 
the LCAP library suggests that we could modify the optimization method in a way that retains the 
power and appeal of the LCAP approach while side-stepping the ruggedness associated with the 
non-integer (not realizable, or "intermediate") molecular structures. In the following section, we 
present the development in which the GDMC and LCAP methods are combined to overcome the 
difficulties associated with the ruggedness of the property surface. 

III. THE GENERAL GDMC APPROACH 



For a given discrete space, the discrete optimization problem may be transformed into be a 
continuous one by interpolatio n ^ Oi^O i^. Then, the interpolated continuous surface can be used for 
the specific optimization problem of interest. For example, in molecular design, when one frag- 
ment A is replaced by another fragment B to form a new molecule, this transformation is indeed 
discrete. Additionally, the weighting of each fragment at each site is defined by one coefficient. 



When the molecule contains A, the coefficient of A is set to 1; otherwise it is set to 0. The set of 
coefficients (either or 1) corresponds to one real molecule. Thus, normal integer programming 
algorithms such as Monte Carlo and genetic algorithms can be applied to search in the discrete 
space. However, we can also allow A to change continuously to B, which means the A coefficient 
is decreased from 1 to continuously and the B coefficient is increased from 1 to 0. It is then pos- 
sible to use gradients with respect to the coefficients to aid the molecular property optimization. 
Mathematically, the LCAP coefficients discussed in Section |ll] play the same roles in the potential 
optimization to transform a discrete problem into a continuous one. The possible transformations 
are not unique. Some approaches may simplify the optimization problem while others may worsen 
the problem, depending on the property and the specific molecular fragment libraries. In the LCAP 
approach, when the coefficients are between and 1, the system becomes an "intermediate" or al- 
chemical species. It may lead to non-smooth behavior of the molecular property in the unphysical 
intermediate states. Such a scenario was encountered in the LCAP continuous optimization due to 
the fractional number of electrons when the NLO properties were targeted for optimization. We 
have found that the system with one fractional electron is more likely to have a much higher non- 
linear polarizability than the corresponding systems with an integer number of electrons because 
the NLO property is quite sensitive to the electronic states. Thus, it may be particularly ineffective 
to optimize NLO properties for diverse libraries on an interpolated continuous surface. 

Although the interpolated continuous surface may be rugged, the local gradient information 
for each set of coefficients represents approximately how the property varies when the coefficients 
change. For example, as demonstrated in our previous studies^, calculated gradients for the real 
molecule were used to build the next molecule and to guide the search efficiently. But the search 
can be easily trapped in local optima. Therefore, we develop a new strategy, the gradient-directed 
Monte Carlo (GDMC) approach, to deal with the global optimization of discrete variables. GDMC 
utilizes the local gradients to enhance the search for local optima, while the stochastic jumps over- 
come the barriers between local optima. The details of a GDMC calculation for the minimization 
of a general property P consist of the following steps : 

1. Set the iteration number / = and begin with an initial set of coefficients {b^} (either or 
1) at each site. 



2. Set i = i+1. Calculate the property Pi and its gradients -^ with respect to all coefficients 

""a 
{b'^} at each site. Exit if the optimization goal or the maximum number of iteration is 

reached. 

3. Make a trial move to generate a new set of {b'^} following the computed gradients {-^fr}- 



Accept the trial move with the probability p = min{\, e ^^'^' ^'i)} using the Metropolis 
rule. If the trial move is accepted, go to step 2; If the trial move is rejected, go to step 3 . 

Here, j8 = \/{kT). On the energy surface, k is Boltzmann's constant and T is the temperature. 
However, on the property surface, k and T are two parameters: the value of k is constant; T is an 
empirical temperature that controls the probability of accepting a set of coefficients. The rules for 
generating new trial molecules following the LCAP gradients are described in the next section. 

Compared to the classical MC method, GDMC does not randomly vary the coefficients. All 
of the coefficients (either or 1) that describe the discrete space are generated from the property 
gradients that can be defined in many ways. As shown in Section |IVl we redefined the LCAP 
gradients associated with GDMC to optimize the NLO properties in three different examples. It is 
also worth noting that the GDMC approach is suitable for any discrete optimization that possesses 
a smoothed-out virtual continuous surface. 

IV. COMBINING GDMC AND LCAP 

We use the absolute value of the first hyperpolarizability as the object function in the GDMC 
optimization. For the LCAP approach, the gradients {^fw} (£<!• 13) include the chemical potential 

/i = ( 1^ j (see eq. [T3]) . which has two values for each of the directions of change in the number of 
electrons and can be calculated based on the formula recently derived for any DFT calculations.— 
Here, we found it possible to bypass the need to calculate /i. We redefine the LCAP gradients by 
a normalization. 
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where Z^ is the total number of electrons for atom or functional group A at position R. Thus, all 
gradients are simply shifted by a constant value, defined in the last term of eq. [Ml that comes from 
the chemical potential. Because this term does not change the ordering of the gradient values, it is 
set to zero. During the GDMC optimization, one molecule is generated at a time and the gradients 
{-^^ } ^e calculated for that structure using eq. [HJ [I2l [IH [5] and [131 These gradients are then 
used to generate the next molecule. 



How the calculated gradients are used to obtain the next molecule within the LCAP framework 
is critical. The main idea is to use the LCAP gradients to choose the most promising fragment at 
each site. This scheme is best explained with a simple example for a framework with two fragment 
sites shown in Fig. \T\ Each site has a library of four possible fragments. The current molecule is 
AlBl as seen in Fig. [TJ The LCAP gradients {-^^ } are calculated for each fragment. For each 
site, all four functional groups are sorted in descending order according to their negative gradient 
values ({ — -^^ }). In Fig. [H the order after sorting becomes A3 > Al > A2 > A4 for site A, and 
B2 > B4 > Bl > 53 for site B. Based on the ordering at each site, the next molecule is A3B2. If 
A3B2 has been visited before, one of the sites is chosen at random to undergo further mutation. As 
shown in Fig. [H fragment A3 at site A is replaced by fragment A 1 (the next highest ranking frag- 
ment) and the new molecule A 152 is generated. This procedure is repeated until a new molecule is 
obtained. However, because ab-initio calculations of the first hyperpolarizabilities are expensive, 
each design site will be mutated only once. As such, if a new molecule cannot be generated after 
several trial mutations or if the maximum number of iterations has been reached, the optimization 
stops. Using this procedure to generate a new molecule according to the normalized LCAP gradi- 
ents, it is straightforward to implement the GDMC algorithm described in Section|nI]to optimize 
NLO properties. The new GDMC-LCAP approach is tested in Section El 

In molecular design, the geometry changes of the target system have important effects on the 
target properties. In the current work, since we focus on optimizing the first hyperpolarizabilities 
of the extended ;r-electron structures with donors and acceptors, all of the candidate molecular 
structures are rigid. The substitutions at design sites do not disrupt the ;r-conjugation. Therefore, 
during the GDMC optimizations for three cases in|Vl the framework geometries were fixed. 

V. GDMC-LCAP OPTIMIZATION OF THE FIRST HYPERPOLARIZABILITY 

Extended ;r-electron structures with donors and acceptors are well known to have large first 
hyperpolarizabilities^2^>^>2^>22i22ii22ii2Mii22iiMiM;2^i2^i2I^ and their response properties are well un- 
derstood in the context of both simple models and more involved quantum chemical calculations^. 
Thus, our studies focus on donor- ;r-acceptor frameworks and one dominant tensor element, jS^^^, 
is computed and optimized. To explore the efficiency of the GDMC-LCAP approach, we opti- 
mized the absolute value of the first hyperpolarizability for three different ;r-electron conjugated 
scaffolds. The size of the cubic box in PWSCF^^ was 22 x 22 x 22 au^ in Case I, 30 x 30 x 30 au^ 
in Case II, and 60 x 60 x 60 au^ in Case III. The energy cutoff in the PWSCF calculation was 15 
Ryd.^^ The step size of the electric fields in Case I was O.OOSau and was set to 0.0025aM in Cases 



II and III (the smaller step size was used because of convergence issues during the SCF calcu- 
lations). The functional used was LDA-^^ and the maximum number of iterations was 30 except 
Case III, where it was 40. The empirical temperature parameter was also studied in three cases. 

A. Case I 

We first used a six-membered ring system to test the GDMC-LCAP optimization approach. The 
structural framework is shown in Figl2l The geometry was taken from benzene (rc-c = 1-390, 
rc-H = 1 -095) and was held fixed during the optimization. Either a CH group or A^ atom could 
be placed at each site, producing 64 possible structures (many are identical due to the rotational 
symmetry). The structures can also be enumerated to determine the largest P^zz value. From the 
enumeration, structure (b) in Fig. [3] was found to have the largest Pzzz value. 

The GDMC-LCAP optimization began with the initial molecule, structure (a) in Fig. [31 and 
the optimization profile is shown in Fig. [3l Five runs with different initial molecules were tested 
as well. They all yielded similar search profiles and produced structure (b). In Fig. [3l three 
degenerate molecules that had the same structure (b) (due to symmetry) were found. Since the 
initial structure (a) has a relatively large Pzzz value, the optimization becomes trapped in this local 
optimum and the Monte Carlo stochastic moves were helpful in generating the next molecule 
and overcoming the barrier between local optima. From the 6th molecule in the profile, the LCAP 
gradients guided the molecular generation quite efficiently. In molecules 7—10 generated from the 
LCAP gradient information, the absolute value of Pzzz increased until the maximum value of jS^^ 
was reached. The same behavior, led by local optimization based on the LCAP gradients, is also 
observed in the later steps shown in Fig. [3l The total number of molecular property calculations 
was only 26, and 14 jumps between molecules based on the LCAP gradients showed increased 
Pzzz values. This indicates that the LCAP gradients indeed helped generating new molecules with 
increased property values. 

B. Case H 

In Case II, a more complicated family of structures was explored. The structural framework 
is shown in Fig. |4l The six-member ring geometry is the same as in Case I. The geometries of 
the functional groups were generated using the Spartan Builder^^. Either an H, CHt,, NO2 or NH2 
fragment was placed at sites 1 and 2. Either a CH or N was placed at sites 3 through 6. Sites 1 and 
2 are located on the z-axis. By enumerating all 256 possible structures, molecule (b) was found to 
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have the largest fizzz value, shown in Fig. \5\ 

In Case II, at sites 1 and 2, the four functional groups have different numbers of electrons. 
When the continuous LCAP optimization is implemented, we observed that some "intermediate" 
molecules with a fractional number of electrons had large P^zz values. The fractional number of 
electrons is unphysical and makes the continuous surface rugged. However, in the GDMC-LCAP 
optimization, only the properties of real molecules are computed and the LCAP gradients direct 
the jumping from one molecule to another. Since this surface is more complicated, we tested 
the optimization using three different temperature parameters: T = IK, 50K, and 100^. All the 
optimizations began with the same initial molecule (a), shown in Fig. [5l and the optimization 
profiles are shown in Fig. \5\ 

All the GDMC-LCAP optimizations found the same molecule (b). When T = IK, only 17 
molecules were computed and the best molecule was found in 2 jumps, indicated by the solid 
black line of Fig. [5] using the ordering of the LCAP gradients defined in Section |IVl After the 
properties of 17 molecules were calculated, the optimization became trapped because of the low 
temperature. When the temperature was increased from 1^ to 50^, the GDMC protocol quickly 
jumped out of the local optimum around the initial molecule (a) in the beginning of the optimiza- 
tion, as shown by the dotted red line in Fig. [5l The largest jS^^^ molecule was obtained in the third 
property calculation. As the optimization continued at the higher temperature, GDMC generated 
new molecules that were more likely to be accepted based on their property values. The other 
degenerate molecule (b) was obtained in the eleventh property calculation. When the temperature 
was increased further to 100^, the profile showed similar behavior. Basically, when T varies from 
1^ to 100^, GDMC has more flexibility to jump out of local optima and explore the discrete space 
more broadly. 

C. Case ni 

In Case III, a large porphyrin substituted donor-;r-acceptor framework was used (see Fig. 
[6l). The geometry was taken from molecule (c) in Fig. |7] that had been optimized at the 
B3LYP/6 — 31 + G* level using Gaussian03.— The geometries of all donors and acceptors were 
generated using the Spartan Builder^^. At site A, one of three acceptor groups (NO2, CN or COH) 
was placed. At site D, one of three donor groups {N{CHj)2, CHj, or OCH3) was placed. Sites 1 
through 10 have either a CH or A^ unit. This family of structures includes a total of 9,216 possible 
molecules. Performing ab initio first hyperpolarizability calculations on all of the possible struc- 
tures is costly. Thus, we cannot enumerate all of the structures to find the one with the largest P^zz 
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value. 

The initial molecule (a) consists of a normal porphyrin ring with a C//3 at site D and a CN at 
site A that were chosen randomly from four possible donors and four acceptors. Based on jS^^z 
calculations for several initial molecules, the ^^zz values vary from thousands to tens of thousands 
(in atomic units). Thus, we executed three GDMC optimizations at three different temperatures 
beginning with the initial molecule (a) shown in Fig. |71 The maximum number of iterations was 
40. All of the optimization profiles are shown in Fig. [H Two of the trajectories were trapped in 
local optima because the surface is rugged. The optimizations at T = 300^ and T = 500K were 
stuck after less than 25 molecules were analyzed. At T = 800^, GDMC jumped out of the local 
optimum and found one better molecule (b), shown in Fig. U\ However, we further tested five 
trial runs beginning from five different random initial molecules at T = 300^. The most favorable 
molecule of all trial runs was molecule (b), shown in Fig. U\ This suggests that for much more 
rugged surface, the optimization is easily trapped when T is not sufficiently high. Hence, for the 
low temperature such as T = 300K in this case, several independent optimizations with different 
random initial molecules allow more broad explorations of the molecular space. In addition, even 
for this rugged surface, the local gradient information from LCAP assists in jumping from one 
molecule to another with a higher P^vy value. For example, at T = SOOK, the P^^ value was 
increased from molecules 21 to 25 in the optimization profile. 

The obtained molecule (b) shown in Fig. |7] contains a new type of porphyrin-like ring. The 
origin of its high jS^^, becomes an interesting question. We speculate that, because nitrogen has 
more electrons than carbon, the four additional nitrogen atoms present in molecule (b) increase 
the number of n electrons that are delocalized in the ring. In addition, the asymmetry of the 
broken ring facilitates polarization. Furthermore, molecule (b) has the strongest donor (N{CH^)2) 
and acceptor {N02)^ among the simple donor-acceptor structures. For these reasons, we predict 
that molecule (b) of Fig. [7] has a larger jS^^^ value than those with conventional porphyrin cores. 
Molecule (b) may represent a global optimum or a near optimal structure. 

VI. CONCLUSION 

Although the original LCAP approach was demonstrated to optimize the first hyperpolarizabil- 
ity of simple systems efficiently^, it suffers from the rugged surfaces when LCAP is applied for 
more complicated molecular libraries. During the property optimization, some "intermediate" or 
"alchemical" molecules were obtained with unphysically large first hyperpolarizabilities because 
of an unphysical fractional number of electrons. However, the LCAP gradients of the property 
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with respect to the LCAP coefficients are useful in directing the search for candidate molecules. 
To utilize these LCAP gradients and to avoid analyzing "intermediate" molecules, we developed 
a gradient-directed Monte Carlo (GDMC) method that can be used in discrete optimization. After 
making the discrete space continuous, the property gradients with respect to the discrete variables 
can be calculated. The gradients are then used to generate a new set of discrete variables. The 
stochastic Monte Carlo moves assist in jumping out of local optima. Specifically, the LCAP gra- 
dients computed from one real molecule are normalized and utilized to generate a new structure. 
Therefore, the combined GDMC-LCAP approach was used to optimize the first hyperpolarizabil- 
ity for three different structural frameworks. 

For Cases I and II, several runs of the GDMC-LCAP approach always found the most favorable 
structure that was validated by the exhaustive enumeration of all possible molecules. The jumps 
from one molecule to another based on the LCAP gradients have greater likelihood to increase 
the molecular first hyperpolarizability. The temperature used in GDMC-LCAP is an empirical 
parameter. When T is sufficiently high, the optimization stops only if the maximum number of 
iterations is reached, for example, see Fig. [5]for Case n when T = 50K and T = lOOK. When T is 
too low, the optimization is trapped after several trial mutations, for example, see Fig. [5] for Case 
II when T = IK . To explore the molecular space more broadly when T is low and ensure that 
the optimal property is found, several optimizations beginning from random initial molecules may 
be required. For example, in Case III, five trial runs beginning from five different random initial 
molecules at T = 300K were performed and structure (b) in Fig. |7] was obtained in all five runs. 
Furthermore, if GDMC is combined with heating, cooling, or annealing protocols, GDMC will be 
more robust and irrespective of the initial guess. 

In Case III, the P^^y value of structure (b) in Fig. |7] obtained during the GDMC optimizations 
is increased almost 20% compared to the initial seed molecule (a) in Fig. |71 In particular, the 
optimized structure has the strongest donor and acceptor groups at the D and A sites and a n- 
electron ring with four nitrogen atoms more than in a porphyrin. Thus, we predict that structure 
(b) in Fig. |7]is the optimal or nearly optimal structure. 

To compare the GDMC method with other approaches such as the classical Monte Carlo (MC) 
method and a genetical algorithm (GA), we recently studied an HP model^"^"^ for protein se- 
quence design and protein folding.'^ Those results indicate that the GDMC approach is much more 
efficient than the classical MC or GA optimization, because the property gradients with respect to 
the discrete variables, combined with the Monte Carlo moves, assist in jumping out of local optima 
and produce a greater likelihood of generating a new set of discrete variables with an enhanced 
property value. 
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When the geometry changes of frameworks are involved during the property optimization, 
the advantage of local searching directed by property gradients may not be clear. However, in a 
recent work, GDMC was employed to optimize protein sequence with concurrent protein structure 
optimization.— It shows that GDMC is still more efficient than the classical MC. 

In summary, the combined GDMC-LCAP approach provides an effective strategy for finding 
molecules with optimized molecular properties. The GDMC method can be used to explore other 
discrete spaces as well as the LCAP space. 
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Figure [B Generation of a new molecule using LCAP gradients for GDMC optimization. Here, 
only two sites are considered. At each site, there are four possible fragments (A1,A2,A3,A4 for 
site A , and 5 1 , 52 , 53 , 54 for site 5) . A gray box indicates that the fragment is present in the current 
molecule. Based on the LCAP gradients calculated for molecule A 151, all of the fragments at each 
site are reordered and the next molecule A352 is generated. If A352 has been visited before, one 
site is randomly chosen to undergo mutation. In this example, site A is chosen and fragment A3 is 
mutated to fragment A 1 (the next highest ranking fragment) and A 152 is generated. This procedure 
is repeated until the maximum number of iterations is reached or no more new molecules can be 
generated. 

Figure [2l Framework for Case I. Either a CH or A'^ can be placed at each site. 64 possible 
structures exist, without considering symmetry equivalent structures. 

Figure[3l Optimization profile for Case I (T=1K). The x-axis indexes the calculated molecules 
during the optimization; the y-axis represents the absolute value of jS^^z' where empty circles 
denote negative molecular P^^^ values and solid circles denote positive values. The initial molecule 
is structure (a). Three degenerate molecules with structure (b) and the largest jS^^ value were 
obtained. During the optimization, the LCAP gradients guided the generation of new molecules 
and resulted in a greater likelihood of generating a new molecule with a higher property value. 
The Monte Carlo random moves in the initial stage assisted in jumping out of the local optimum. 

Figure m Framework for Case II. At sites 1 and 2, either an H, CH^, NO2 or NH2 was placed. 
Either a CH or A^ was placed at sites 3 to 6. Sites 1 and 2 are located on the axis z. 256 possible 
structures exist, without considering symmetry equivalent structures. 

Figure [5l Optimization profiles for Case II with three different temperature parameters. The 
largest IP^zzl molecule (b) was found in all optimizations. When T is increased from 1^ to 50^ to 
100^, the other chemically identical molecule (b) was also found, suggesting that the optimization 
results do not depend critically on the temperature parameter. 

Figure [61 Framework for Case III. At site A, one of three acceptor groups {NO2, CN or COH) 
was placed. At site D, one of three donor groups {N{CH^)2, CHt, or OCHt,) was placed. At sites 
1 through 10, either a CH or N was placed. 9,216 possible molecules exist, without considering 
symmetry equivalent structures. 

Figure |71 (a) Initial molecule in Case III. (b) Molecule with the largest jS^^z value after four 
GDMC optimizations were carried out using three different temperature parameters, (c) Molecule 
with a normal porphyrin ring and the strongest donor and acceptor groups in Case III. 

Figure [8l Three optimization profiles for Case III with three temperatures ranging from 300K 
to 800^. Pzzz values are always positive due to the donor- ;r- acceptor framework. The rugged 
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surface trapped three optimizations when T = 300 A' and 500^. At T = 800^, GDMC jumped out 
of the local optimum and yielded molecule (b), shown in Fig. |71 
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